- Dan,
-
- Your work on the SGML side (as all your work) is much appreciated!
-
- > Date: Thu, 19 Nov 92 04:37:23 CST
- > From: Dan Connolly <connolly@pixel.convex.com>
- >
- > The thrust to register HTML with the authorities has
- > spurred me to look over the DTD again. I've found some
- > problems.
- >
- > 1. Currently the NAME attribute of an anchor is declared
- > as CDATA, i.e. just about anything. There's an SGML thingy
- > called an ID. SGML parsers enforce uniqueness among the
- > IDs of a document. Seems like that's what we want for ID
- > names.
- >
- > But an SGML ID has to start with a letter. So all the
- > HTML files that use numbers as anchor names will break.
-
- The enforcement of uniqueness is useful, and it is what we want.
- It is unfortunate that the very same constraint led to the use of numbers!
- This is a hangup of the NeXT editor (which I still use, until
- someone makes a more convenient editor!) but we oughtn't to
- worry about it. A future editor could generate Z[0-9]* names.
- We could even specify that Z[0-9]* are related to a NEXTID attribute
- somewhere for the generation of time-unique IDs.
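
As a rough illustration of the Z[0-9]* idea, the sketch below allocates anchor names from a counter. It assumes the document keeps the next free number in NEXTID; the function names are hypothetical and not part of any existing editor.

```python
import re

# Assumption: names of the form Z<digits> are reserved for generated
# anchors, and NEXTID holds the next unused number for the document.
GENERATED_NAME = re.compile(r"Z[0-9]*\Z")


def is_generated_name(name):
    """True if the name matches the Z[0-9]* pattern."""
    return GENERATED_NAME.match(name) is not None


def allocate_anchor_name(next_id):
    """Return a fresh anchor name and the NEXTID value to store back."""
    return "Z{}".format(next_id), next_id + 1


name, next_id = allocate_anchor_name(12)
print(name, next_id)
```

Because every generated name starts with the letter Z, it would satisfy the SGML rule that an ID begins with a letter, and the counter keeps names unique within the document.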
-
- The only neat thing about CDATA is that it would allow a gateway
- to put in something which has come from the data. For example,
- a glossary generator might generate anchors for each term
- whose name equals the term, and then generate index entries
- pointing to that.
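
A minimal sketch of such a glossary gateway follows, assuming NAME stays CDATA so that a term containing spaces is acceptable as an anchor name. The function names and the glossary.html address are made up for illustration.

```python
from html import escape
from urllib.parse import quote


def glossary_anchor(term, definition):
    """Anchor whose NAME equals the term itself (only possible with CDATA)."""
    return '<A NAME="{}">{}</A>: {}'.format(
        escape(term, quote=True), escape(term), escape(definition))


def index_entry(term, glossary_url):
    """Index entry pointing at the anchor generated for the same term."""
    fragment = quote(term)  # percent-encode spaces and the like for the link
    return '<A HREF="{}#{}">{}</A>'.format(glossary_url, fragment, escape(term))


print(glossary_anchor("mixed content", "text and elements interleaved"))
print(index_entry("mixed content", "glossary.html"))
```

A name such as "mixed content" would not be a legal SGML ID, which is exactly the trade-off between the two declarations.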
-
- What do you think?
-
- > 2. I introduced two tag names when I drafted the DTD:
- > HTML contains the whole document. I defined it
- > so you can omit both the start and the end tags, so it's
- > inferred by SGML parsers. I don't think I can avoid some
- > top-level tag.
- > DOCUMENT contains most of the "body" -- all the
- > headings and paragraphs. I did this to avoid something
- > called mixed content, which causes complications. I
- > could rename this element as BODY, and introduce an
- > omittable HEADING tag to surround the TITLE, NEXTID, and
- > ISINDEX tags.
-
- I like the latter idea. Header and Body fit in well with mail
- nomenclature, whereas "document" is normally the whole thing
- retrieved.
-
- > 3. I stuck anchors in as an inclusion, meaning they could
- > be used just about anywhere. I thought stuff like <a
- > name=foo><h1>Foo</h1></a> was legal, but neither linemode
- > nor the midas browser groks it.
-
- The line mode browser doesn't? It should. Titles are the only thing I
- wanted to insist were plain ASCII text....
- Turns out to be a bug in HTML.c -- fixed for next release.
-
- > I'm editing the DTD to restrict the usage of anchors to
- > only contain text strings.
-
- I don't like that.... I think that especially as we introduce
- highlighting, anchors will want to be general areas of text, so
- long as they are nested properly (an "SGML attitude" restriction
- which Frank Kappe objected to, I recall).
-
- > 4. The OL tag is disappearing. It's no longer documented
- > in the web, and it's not supported by MidasWWW. Should
- > I delete it from the DTD?
-
- You say it's useful? If you have implemented it, and no one else
- objects, then we could put it back in. In principle, with hypertext, you don't
- have to number things; you can refer to them with a link. However, you
- can imagine the abstract difference between an ordered list and
- a sack of objects being important. [For example, a list of
- instructions is ordered.] I'll put it into the HTML2 list of features.
- I suggest everyone implement OL as UL in programs which, like the line mode
- browser, can't differentiate.
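
The fallback suggested above can be sketched as follows. This is only an illustration with a hypothetical render_list function and a can_number flag standing in for whatever a real client knows about its display; it is not how the line mode browser is written.

```python
def render_list(tag, items, can_number=True):
    """Render list items as text lines.

    tag is "OL" or "UL". A client that cannot number items
    (can_number=False) treats OL exactly like UL.
    """
    if tag not in ("OL", "UL"):
        raise ValueError("expected OL or UL, got " + repr(tag))
    numbered = tag == "OL" and can_number
    lines = []
    for position, item in enumerate(items, start=1):
        marker = "{}.".format(position) if numbered else "*"
        lines.append("{} {}".format(marker, item))
    return lines


steps = ["Open the file", "Edit the anchor", "Save"]
print(render_list("OL", steps))
print(render_list("OL", steps, can_number=False))
```

The order of the items is preserved either way, so a client that falls back to bullets still presents instructions in sequence.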
-
-
- > 5. What about <HP1> thru <HP5>... should we include them?
- > I'd prefer <em>, <tt>, <cite>, ala TeX. Or we could go
- > with the O'Reilly/Hal DocBook tags: <Emphasis>,
- > <OopsChar>, <wordasword>,<CiteBook>,<Subscript>,
- > <Superscript>.
-
- I agree that numbering them is on the verge of useless. The trouble is,
- you never have enough. Why CiteBook but not CiteProgram? etc etc.
- The docbook names are on the long side, aren't they? Not very important
- I suppose.
-
- > 6. Any more thoughts on the BaseAddress tag?
-
- Yes. It should be in, I think. I've mentioned it in
- http://info.cern.ch/hypertext/WWW/MarkUp/Future.html
-
- > 7. The HTML tags documentation says Listing sections can
- > contain any ISO Latin 1 characters. The SGML standard
- > mentions ISO 646, i.e. ascii, as the default, but the
- > sgmls parser, the linemode browser, and MidasWWW all seem
- > to grok Latin1 just fine.
-
- I suggest we limit it to ASCII unless something outside the
- document says otherwise, while strongly recommending that
- 8-bit character sets should be handled by the apps. I have
- seen some funnies when two clients both handle 8-bit characters,
- but not the same ones.
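
The kind of mismatch described above can be reproduced with a short sketch. The choice of cp437 as the second client's character set is purely an assumption for the example, and decode_ascii_only is a hypothetical helper showing the "ASCII unless told otherwise" default.

```python
raw = b"caf\xe9"  # ASCII letters followed by one 8-bit byte


def decode_ascii_only(data):
    """Accept the data only if it is pure 7-bit ASCII, else return None."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return None


as_latin1 = raw.decode("latin-1")  # client that assumes ISO Latin 1
as_cp437 = raw.decode("cp437")     # client that assumes an IBM PC code page

print(decode_ascii_only(raw))      # None: the byte 0xE9 is outside ASCII
print(as_latin1 == as_cp437)       # False: same bytes, different characters
```

Both clients handle 8-bit data, yet they disagree on what the byte means, which is why something outside the document has to name the character set.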
-
- Does the SGML standard say how to specify the character set for
- the text?
-
- > Dan
- >
-
- Tim
-
-